Abstract

Background: Genetic interactions pervade every aspect of biology, from evolutionary theory, where they 
determine the accessibility of evolutionary paths, to medicine, where they can contribute to complex genetic 
diseases. Until very recently, studies on epistatic interactions have been based on a handful of mutations, providing 
at best anecdotal evidence about the frequency and the typical strength of genetic interactions. In this study, we 
analyze a publicly available dataset that contains the growth rates of over five million double knockout mutants of 
the yeast Saccharomyces cerevisiae. 

Results: We discuss a geometric definition of epistasis that reveals a simple and surprisingly weak scaling law for 
the characteristic strength of genetic interactions as a function of the effects of the mutations being combined. 
We then use this scaling to quantify the roughness of naturally occurring fitness landscapes. Finally, we show
how the observed roughness differs from what is predicted by Fisher's geometric model of epistasis, and discuss 
the consequences for evolutionary dynamics. 

Conclusions: Although epistatic interactions between specific genes remain largely unpredictable, the statistical 
properties of an ensemble of interactions can display conspicuous regularities and be described by simple 
mathematical laws. By exploiting the large amount of data produced by modern high-throughput techniques, it is now
possible to thoroughly test the predictions of theoretical models of genetic interactions and to build informed 
computational models of evolution on realistic fitness landscapes. 
Background

Genetic interactions [1] have shaped the evolutionary 
history of life on earth. They have been found to limit 
the accessibility of evolutionary paths [2], to confine 
populations to suboptimal evolutionary states and, on 
larger time scales, to control the rate of speciation [3]. 
Epistatic interactions can also be relevant to the devel- 
opment of complex human diseases such as diabetes [4]. 
Complex traits and diseases are determined by a multi- 
plicity of genomic loci [5], whose independent effects 
and interactions [6] are often necessary to understand 
the phenotype of interest. Despite the broad implica- 
tions of epistatic interactions, a quantitative characteri- 
zation of their typical strength is still lacking. In this 
study, we consider growth rate in yeast as an example of
a complex trait modulated by genetic interactions. 

Previous studies [7-10] on the relation between the 
growth effects of a mutation and its epistatic interactions 
have often been based on a handful of mutations, and 
only in recent years has anecdotal evidence started being 
replaced by robust statements based on large data sets. 
Perhaps the most impressive of these datasets is the one 
made publicly available with the publication of the article 
entitled 'The genetic landscape of a cell' by Costanzo 
et al. [11]. The genome of the budding yeast
Saccharomyces cerevisiae includes approximately 6,000 genes,
about 1,000 of which are essential. Viable mutants can be 
constructed by knocking out any of the approximately 
5,000 non-essential genes, by reducing the expression of 
the essential genes, or by partially compromising the 
functionality of the gene products. The dataset (see Additional file 1, Figure S1) has been compiled with the growth rates of about 5.4 million double knockout mutants, a sizable fraction of all possible double knockout mutants in yeast. Supported by the Costanzo et al.
dataset, we consider the fundamental question of 
whether mutations with larger effects have stronger 
genetic interactions. 

Results and discussion 

An unbiased definition of genetic interactions 

A basic approach to study genetic interactions is to consider two mutations with known effects on a quantitative trait, and to measure their combined effect in the double mutant [12]. Given the growth rates of a wild type S. cerevisiae strain ($g_{00} = 1$) and of two single knockout mutants ($g_{01}$ and $g_{10}$), the growth rate of the double knockout mutant ($g_{11}$) is adequately predicted [11,13] by a multiplicative null model:

$$\frac{g_{11}}{g_{00}} = \frac{g_{01}}{g_{00}} \cdot \frac{g_{10}}{g_{00}}.$$

Equivalently, defining 'log growth' as the logarithm of 
the relative growth rate, 

$$G = \log_2(g/g_{00}),$$

the log growth of the double knockout mutant is predicted by an additive null model (Figure 1a):

$$G_{11} = G_{01} + G_{10}.$$
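The equivalence of the two null models can be checked numerically. The sketch below assumes growth rates already normalized so that the wild type grows at $g_{00} = 1$; the function names and the example growth rates 0.9 and 0.8 are hypothetical.

```python
import math


def log_growth(g, g_wt=1.0):
    """Log growth G = log2(g / g_wt), relative to the wild type growth rate g_wt."""
    return math.log2(g / g_wt)


def multiplicative_null(g01, g10, g00=1.0):
    """Predicted g11 from g11/g00 = (g01/g00) * (g10/g00)."""
    return g00 * (g01 / g00) * (g10 / g00)


def additive_null(G01, G10):
    """Predicted G11 from G11 = G01 + G10."""
    return G01 + G10


# Hypothetical single-knockout growth rates (wild type = 1.0).
g01, g10 = 0.9, 0.8
g11_mult = multiplicative_null(g01, g10)
G11_add = additive_null(log_growth(g01), log_growth(g10))

# Converting the additive prediction back to a growth rate recovers the multiplicative one.
print(round(g11_mult, 6), round(2 ** G11_add, 6))
```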

Epistatic interactions are identified as deviations from 
the null model, but several non-equivalent alternatives 
exist for quantifying these deviations [14]. The most 
common definition of epistasis considers the difference 
between the measured and the predicted growth rates 
for the double knockout mutant [11]: 



$$\varepsilon = \frac{g_{11}}{g_{00}} - \frac{g_{01}}{g_{00}} \cdot \frac{g_{10}}{g_{00}}.$$



Importantly, this definition of ε subtly constrains the possible values of epistasis. In fact, when combining very deleterious mutations, ε cannot be large and negative even when the double knockout mutant is a synthetic lethal mutant:

$$\varepsilon = 0 - \frac{g_{01}}{g_{00}} \cdot \frac{g_{10}}{g_{00}} \approx 0 \quad \text{if } g_{01} \ll g_{00} \text{ and } g_{10} \ll g_{00}.$$
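A short numerical illustration of this constraint, using hypothetical growth rates relative to a wild type normalized to 1: a synthetic lethal pair of very deleterious knockouts receives a smaller |ε| than a mildly interacting pair of weak knockouts.

```python
def traditional_epsilon(g11, g01, g10, g00=1.0):
    """Traditional epistasis: measured minus predicted relative growth of the double mutant."""
    return g11 / g00 - (g01 / g00) * (g10 / g00)


# Hypothetical values: two very deleterious knockouts whose double knockout is lethal (g11 = 0).
eps_lethal = traditional_epsilon(g11=0.0, g01=0.1, g10=0.1)

# Hypothetical values: two weak knockouts whose double knockout is mildly slower than predicted.
eps_mild = traditional_epsilon(g11=0.7, g01=0.9, g10=0.9)

# Because g11 >= 0, epsilon can never drop below -(g01/g00)*(g10/g00), here -0.01.
print(round(eps_lethal, 4), round(eps_mild, 4))
print(abs(eps_lethal) < abs(eps_mild))  # the lethal interaction looks weaker
```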

In order to avoid a priori constraints on the intensity of epistasis, genetic interactions can be defined as the base-2 logarithm of the ratio between the measured and predicted relative growth rates, leading to:

$$E = \log_2\frac{g_{11}}{g_{00}} - \log_2\frac{g_{01}}{g_{00}} - \log_2\frac{g_{10}}{g_{00}}.$$
Figure 1 The log growth rates of two mutations combine additively. (a) The average effect of a double knockout ($G_{11}$) as a function of the effects of the single knockouts ($G_{01}$ and $G_{10}$) is $G_{11} = G_{01} + G_{10}$. Experimental mean ± standard deviation (blue line and blue shaded area) and prediction of the additive null model (red line). (b) Given two mutations, there are four possible mutants with their corresponding log growth rates (black dots). If three of the four log growth rates are known, the fourth one can be predicted by a linear extrapolation (red plane), and epistasis can be defined as the linear deviation from such prediction (red arrow). The magnitude of the deviation is the same regardless of which three of the four mutants are chosen.
As an example, $E = +1$ indicates a double mutant whose growth rate is twice as large as would be expected based upon the multiplicative null model, whereas $E = -1$ indicates a double mutant whose growth rate is half as large as predicted. This definition of epistasis as fold deviation in the multiplicative model for growth rates is equivalent to a natural definition of epistasis as linear deviation in the additive model for log growth rates (Figure 1b):

$$E = (G_{00} + G_{11}) - (G_{01} + G_{10}) = G_{11} - G_{01} - G_{10},$$

where the last equality uses $G_{00} = \log_2(g_{00}/g_{00}) = 0$.
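A minimal sketch of the geometric definition, assuming growth rates relative to a wild type normalized to 1; the function name and the example rates (single knockouts at 0.8, so a predicted double-mutant rate of 0.64) are hypothetical.

```python
import math


def geometric_epistasis(g11, g01, g10, g00=1.0):
    """E = G11 - G01 - G10 with G = log2(g / g00), i.e. log2(measured g11 / predicted g11)."""
    def G(g):
        return math.log2(g / g00)
    return G(g11) - G(g01) - G(g10)


# Hypothetical single knockouts at 0.8 each: the multiplicative null model predicts 0.64.
E_twice = geometric_epistasis(1.28, 0.8, 0.8)  # double mutant grows twice as fast as predicted
E_half = geometric_epistasis(0.32, 0.8, 0.8)   # double mutant grows half as fast as predicted
print(round(E_twice, 6), round(E_half, 6))
# A synthetic lethal (g11 = 0) has no finite E: math.log2(0) raises ValueError,
# which is why lethal pairs are counted separately.
```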

A second bias of the common definition of epistasis is that ε depends on the choice of which genotype is labeled as 'wild type' or '00', a choice which is always arbitrary, but more obviously so when studying engineered organisms or populations evolving in alternating environments [15]. By contrast,

$$|E| = |(G_{00} + G_{11}) - (G_{01} + G_{10})|$$

depends only on which pair of genes is considered, being a geometric measure for the 'curvature' of the fitness landscape (Figure 1b).
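The reference-independence of |E| can be checked by relabeling each of the four genotypes in turn as the 'wild type' (flipping the corresponding bits of every label and re-referencing the log growth rates). The growth rates below are hypothetical, and `relabel` is an illustrative helper, not part of the original analysis; the traditional ε is computed alongside for contrast.

```python
import math


def abs_E(logg):
    """|E| = |(G00 + G11) - (G01 + G10)| for log growth rates keyed by genotype label."""
    return abs((logg["00"] + logg["11"]) - (logg["01"] + logg["10"]))


def traditional_eps(logg):
    """Traditional epsilon computed from growth rates relative to genotype '00'."""
    r = {k: 2 ** (v - logg["00"]) for k, v in logg.items()}
    return r["11"] - r["01"] * r["10"]


def relabel(logg, ref):
    """Call genotype `ref` the new wild type: XOR each label with `ref` and re-reference G."""
    def flip(label):
        return "".join(str(int(a) ^ int(b)) for a, b in zip(label, ref))
    return {flip(k): v - logg[ref] for k, v in logg.items()}


# Hypothetical growth rates for the four genotypes of one gene pair.
growth = {"00": 1.0, "01": 0.9, "10": 0.7, "11": 0.5}
logg = {k: math.log2(g) for k, g in growth.items()}

refs = ("00", "01", "10", "11")
geometric = [abs_E(relabel(logg, ref)) for ref in refs]
traditional = [abs(traditional_eps(relabel(logg, ref))) for ref in refs]
print([round(x, 4) for x in geometric])    # identical for every choice of wild type
print([round(x, 4) for x in traditional])  # changes with the choice of wild type
```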

The definition of E has found some favor in the theoretical literature [7,16], but it is not routinely used to analyze experimental data apart from rare exceptions [8,17]. Its main drawback is that synthetic lethals have a log growth rate of $-\infty$ and require a separate, although simpler, analysis in which lethal interactions can simply be counted. The definition of E instead proves extremely valuable when quantifying the strength of non-lethal genetic interactions.

Epistatic interactions scale weakly with mutational effects 

With the appropriate definition of epistasis, a simple 
relation between the growth rate effects of two muta- 
tions and the expected strength of their interaction 
emerges. 

Let us consider two groups of mutations; in the first group, all mutations have log growth effect $G_{01}$, and in the second group, all mutations have log growth effect $G_{10}$. We can then build all possible double mutants obtained by combining one mutation from each group. In the absence of epistasis, all the double mutants have a log growth rate

$$G_{11} = G_{01} + G_{10},$$

and the distribution of genetic interactions is sharply peaked at $E = 0$. When epistasis is present, the distribution of genetic interactions has, in general, non-zero mean and standard deviation. Experimentally, however, the mean of genetic interactions is close to zero (this is why the null model remains approximately valid) (Figure 1a; Figure 2d). Even when the mean interaction vanishes, the difference between the experimental dataset and the ideal case without interactions can be quantified by the finite value of the experimental standard deviation $\sigma(G_{01}, G_{10})$, which provides a numerical estimate for the characteristic strength of epistatic interactions.

In order to produce reliable numerical results, thousands of growth rates are necessary to characterize the probability distribution of epistasis. We analyzed the Costanzo et al. dataset by binning pairs of mutations according to the log growth effects of their single knockouts, $G_{01}$ and $G_{10}$, using the method described above to outline the probability distribution of epistasis. We chose bin sizes that grow exponentially with $G$ in order to ensure an approximately constant number of data points in each bin (see Materials and Methods; see Additional file 1, Figure S2). Most bins contain from thousands to tens of thousands of data points. For each bin, we computed

$$\operatorname{var}(E(G_{01}, G_{10})),$$

that is, the variance of the random variable $E$ relative to the bin labeled by growth rates $G_{01}$ and $G_{10}$. In the rest of the paper we will refer to such variance as $\operatorname{var}(G_{01}, G_{10})$, emphasizing that the variance in the strength of epistatic interactions is, eventually, a function of $G_{01}$ and $G_{10}$ (Figure 2a). The square root of the variance, $\sigma(G_{01}, G_{10})$, then represents the expected strength of epistasis as a function of the independently varying effects of the two single knockouts. A natural expectation for the dependence of epistasis on the effect of the combined mutations comes from rescaling Figure 1a: if all the log growth effects of single and double knockouts increase by a factor of two, then the strength of epistasis should also increase by a factor of two. Unexpectedly, however, when combining deleterious mutations, the strength of epistatic interactions does grow with the effects of the mutations that are combined, but the dependence is much weaker; when the effect of both single knockouts is doubled, the strength of epistasis increases only by a factor of $\sqrt{2}$ (Figure 2).
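A minimal sketch of this binning step, assuming the data are available as $(G_{01}, G_{10}, E)$ triples. The bin base `g_min`, growth factor `ratio`, minimum count, and the synthetic records (Gaussian $E$ whose variance follows the one-parameter fit reported in this section, with c = 0.079) are illustrative assumptions, not the settings used in Materials and Methods.

```python
import math
import random
import statistics
from collections import defaultdict


def log_bin(G, g_min=0.01, ratio=2.0):
    """Exponentially growing bin: bin k holds g_min*ratio**k <= |G| < g_min*ratio**(k+1)."""
    return math.floor(math.log(abs(G) / g_min, ratio))


def binned_variance(records, min_count=200):
    """records: iterable of (G01, G10, E) triples.

    Groups E by the bins of G01 and G10 and returns {(bin01, bin10): (count, var(E))},
    keeping only bins with at least `min_count` entries."""
    groups = defaultdict(list)
    for G01, G10, E in records:
        if G01 < 0 and G10 < 0:  # deleterious knockouts only
            groups[(log_bin(G01), log_bin(G10))].append(E)
    return {key: (len(vals), statistics.variance(vals))
            for key, vals in groups.items() if len(vals) >= min_count}


# Synthetic demo (hypothetical data): |G| log-uniform in [0.02, 1], Gaussian E.
rng = random.Random(1)
records = []
for _ in range(50_000):
    G01 = -math.exp(rng.uniform(math.log(0.02), math.log(1.0)))
    G10 = -math.exp(rng.uniform(math.log(0.02), math.log(1.0)))
    v_E = 2 * 0.079 * abs(G01 * G10) / (abs(G01) + abs(G10))
    records.append((G01, G10, rng.gauss(0.0, math.sqrt(v_E))))

table = binned_variance(records)
for (b01, b10), (n, v) in sorted(table.items())[:3]:
    print(b01, b10, n, round(v, 4))
```

Exponential bins keep the counts per bin roughly constant when effects are spread over orders of magnitude, which is the motivation given in the text.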

In more detail, we observed that if the effect of the first knockout ($G_{01}$) is held constant, the dependence of the variance of epistasis on the effect of the second knockout ($G_{10}$) is well approximated by a Michaelis-Menten law (Figure 2b):

$$\operatorname{var}(G_{01}, G_{10}) \approx \frac{v\,|G_{10}|}{K + |G_{10}|}.$$
Figure 2 The strength of epistatic interactions scales with the log growth effects of the interacting knockouts. (a) Each dot represents the variance of several thousand epistatic interactions binned according to the log growth effects of the two single knockouts, $G_{01}$ and $G_{10}$. The blue surface is the phenomenological fit: $\operatorname{var}(G_{01}, G_{10}) = 0.079 \times 2|G_{01}||G_{10}|/(|G_{01}| + |G_{10}|)$. (b) Slices of the plot in (a) for $G_{01}$ = constant. The dots are the same as in (a), and the solid lines represent the corresponding slice of the one-parameter fitting surface. (c) Diagonal slice of the plot in (a) with finer bins ($G_{01} = G_{10}$ within 20%, $G = \operatorname{mean}(G_{01}, G_{10})$). The blue shaded area is the 25 to 75% confidence interval computed by bootstrap; the red line ($\operatorname{var}(G, G) = 0.079|G|$) is computed from the phenomenological model, and the dashed gray line, for which $\operatorname{var}(G, G)$ is proportional to $G^2$, represents the lower bound to the slope predicted by Fisher's geometric model. (c, inset) The epistatic interactions between beneficial mutations are vanishingly small, independently of the effect of the combined mutations. (d) Probability density functions $p(E')$ for the strength of genetic interactions between two deleterious knockouts with similar log growth effects. Different colors correspond to knockouts with different effects: the growth rate effects of the single knockouts being combined are close to -38% (red), -22% (yellow), -12% (green), -6% (blue), and -3% (purple). Each curve has been rescaled so that all distributions have a standard deviation of 1. The left tail of the distributions is fat, describing the occurrence of strong negative genetic interactions (for comparison, the dashed-dotted black line is a normal distribution).

When the effects of both knockouts are free to vary, the requirement that the variance is a symmetric function of its two variables, $G_{01}$ and $G_{10}$, implies that $K = |G_{01}|$ and that $v$ is proportional to $|G_{01}|$. A one-parameter function which fits the observed variance over the whole range of deleterious fitness effects (Figure 2a) is then:

$$\operatorname{var}(G_{01}, G_{10}) = 2c\,\frac{|G_{01} G_{10}|}{|G_{01}| + |G_{10}|},$$

with $c = 0.079$.
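The one-parameter fit, var = 2c|G01 G10|/(|G01| + |G10|) with c = 0.079, can be evaluated directly. The sketch below also checks that, at fixed $G_{01}$, it coincides with a Michaelis-Menten slice v|G10|/(K + |G10|) with K = |G01| and v = 2c|G01|; the function names and the fixed value $G_{01} = -0.5$ are hypothetical.

```python
C = 0.079  # fitted constant reported in the text


def var_fit(G01, G10, c=C):
    """One-parameter fit: var = 2c |G01 G10| / (|G01| + |G10|), for deleterious knockouts."""
    a, b = abs(G01), abs(G10)
    if a + b == 0.0:
        return 0.0  # both knockouts neutral: no epistasis
    return 2.0 * c * a * b / (a + b)


def michaelis_menten(G10, v, K):
    """Slice at fixed G01: var = v |G10| / (K + |G10|)."""
    return v * abs(G10) / (K + abs(G10))


G01 = -0.5  # hypothetical fixed effect of the first knockout
for G10 in (-0.05, -0.2, -0.5, -2.0):
    full = var_fit(G01, G10)
    # Same value from the Michaelis-Menten slice with K = |G01| and v = 2c|G01|.
    mm = michaelis_menten(G10, v=2 * C * abs(G01), K=abs(G01))
    print(G10, round(full, 5), round(mm, 5))
```

As |G10| grows, each slice saturates at the plateau v = 2c|G01|, the characteristic Michaelis-Menten behavior.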



This functional form can also be obtained from a simple model based on diffusion in fitness space (see Additional file 1, Supplementary text 1). An even simpler phenomenological fit, although slightly less accurate, is:

$$\operatorname{var}(G_{01}, G_{10}) = c\sqrt{|G_{01}|\,|G_{10}|}$$



(see Additional file 1, Figure S3). Importantly, these functions capture two major features of the data: first, epistasis vanishes when $G_{01} = 0$ or $G_{10} = 0$; second, when the effects of the two knockouts are similar ($G_{01} = G_{10} = G$ along the diagonal of the surface in Figure 2a), the variance of epistasis is approximately proportional to $|G|$ (Figure 2c):

$$\operatorname{var}(G, G) = c\,|G|.$$
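Both fits can be compared along the diagonal, and the weak scaling checked directly: doubling both effects multiplies σ = √var by √2 rather than 2. This is a sketch with c = 0.079 from the text; the sample effects are hypothetical.

```python
import math

C = 0.079  # fitted constant reported in the text


def var_fit(G01, G10, c=C):
    """2c|G01 G10| / (|G01| + |G10|): c times the harmonic mean of |G01| and |G10|."""
    a, b = abs(G01), abs(G10)
    return 0.0 if a + b == 0.0 else 2.0 * c * a * b / (a + b)


def var_sqrt(G01, G10, c=C):
    """Simpler fit: c * sqrt(|G01| |G10|), i.e. c times the geometric mean of |G01| and |G10|."""
    return c * math.sqrt(abs(G01) * abs(G10))


# On the diagonal G01 = G10 = G both fits reduce to c|G|.
for G in (-0.1, -0.2, -0.4):
    print(G, round(var_fit(G, G), 6), round(var_sqrt(G, G), 6), round(C * abs(G), 6))

# Doubling both effects multiplies sigma = sqrt(var) by sqrt(2), not by 2.
ratio = math.sqrt(var_fit(-0.4, -0.4) / var_fit(-0.2, -0.2))
print(round(ratio, 6), round(math.sqrt(2), 6))
```

Since the harmonic mean never exceeds the geometric mean, the square-root form slightly overestimates the variance away from the diagonal while agreeing exactly on it.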

The scaling described above is seen only for deleter- 
ious knockouts. When combining the beneficial 
knockouts in the dataset instead, the strength of epistasis is close to zero (Figure 2c, inset). This might be
because the slightly beneficial knockouts are not adap- 
tive mutations, but simply remove genes that are not 
needed in the conditions chosen for the experiment, so 
that their interactions are likely to be negligible. How- 
ever, in apparent contrast to this observation, recent 
studies [8,18] on adaptive mutations in Escherichia coli 
suggest that genetic interactions between adaptive muta- 
tions are mostly negative. In fact, during adaptation, the 
prevalence of negative interactions is likely to be caused 
by biased sampling, because the mutations that fix in 
the population are likely to be the ones that solve envir- 
onmental or biological challenges for an organism. 
Diminishing returns arise because multiple 'solutions' to the same challenge are not necessarily better than a single solution.
Rather than focusing on mutations that fix during a 
bout of adaptation, the Costanzo et al. dataset includes 
a large fraction of all possible pairs of genes in the yeast 
genome. Because for most pairs the two genes are 
involved in unrelated biological processes, interactions 
are often vanishingly small. We did observe, however, 
that the distribution of epistatic interactions is asym- 
metric, with a heavy tail of deleterious interactions 
(Figure 2d). 

Experimental uncertainty generates spurious epistatic 
interactions 

When inferring genetic interactions from experimental 
data, it is important to take into account that each mea- 
sured growth rate is affected by some uncertainty, and 
that measurement errors in the growth rates could erro- 
neously be interpreted as genetic interactions. Impor- 
tantly, for each single and double mutant, the Costanzo 
et al. dataset provides the mean growth rate together 
with its estimated experimental uncertainty (the growth 
rate of each mutant being measured at least four times). 

In order to quantify the effect of the experimental 
uncertainty on the inferred epistatic interactions, we 
constructed a number of mock datasets, assuming that 
the null model without epistatic interactions described 
biology exactly. In these datasets, each single knockout 
had the same growth rate as in the original dataset, and 
each double knockout had a growth rate equal to the 
product of the relative growth rates of the correspond- 
ing single knockouts. We then randomized the mock 
datasets by shifting each growth rate by a random 
amount sampled from a Student's t-distribution, with
width depending on the corresponding experimental 
uncertainty reported in the original dataset (see Addi- 
tional file 1, Supplementary text 3). As expected, analy- 
sis of these 'noisy' datasets revealed some epistasis, 
clearly caused by our addition of experimental noise
rather than by any biological mechanism. We found that
for pairs involving beneficial or neutral mutations, the 
variance computed in the mock datasets was compar- 
able to or even greater than the variance observed in 
the original dataset (Figure 3a, black curves; Figure 3b, 
blue regions). This fact provides an important internal 
control, suggesting that the experimental noise has not 
been underestimated. In spite of this, for pairs of knock- 
outs with substantially deleterious effects, experimental 
noise accounted for less than half of the total observed 
variance, with the rest representing genuine biological 
interactions (Figure 3a, red curves; Figure 3b, red 
regions). 
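The logic of this control can be sketched as follows. The exact procedure is in Supplementary text 3; here the degrees of freedom (df = 5), the rescaling of t draws so their standard deviation equals the reported one, the independent perturbation of each pair, the floor that keeps the logarithm defined, and the example inputs are all hypothetical assumptions.

```python
import math
import random
import statistics


def student_t(df, rng):
    """One draw from Student's t with `df` degrees of freedom: Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) is Gamma(shape=df/2, scale=2)
    return z / math.sqrt(chi2 / df)


def spurious_epistasis_variance(pairs, df=5, seed=0):
    """pairs: (g01, s01, g10, s10, s11) = growth rates relative to wild type and reported SDs.

    Builds a mock dataset that obeys the multiplicative null model exactly (g11 = g01 * g10),
    perturbs every growth rate with t-distributed noise, and returns var(E) across pairs:
    all of this variance is spurious, since the mock data contain no epistasis."""
    rng = random.Random(seed)
    scale = math.sqrt((df - 2) / df)  # assumption: rescale t so its SD equals the reported SD
    E = []
    for g01, s01, g10, s10, s11 in pairs:
        g11 = g01 * g10  # null model, no genetic interaction
        noisy = [g + s * scale * student_t(df, rng)
                 for g, s in ((g01, s01), (g10, s10), (g11, s11))]
        n01, n10, n11 = (max(x, 1e-3) for x in noisy)  # hypothetical floor keeps log2 defined
        E.append(math.log2(n11) - math.log2(n01) - math.log2(n10))
    return statistics.variance(E)


# Hypothetical inputs: 20,000 pairs of mildly deleterious knockouts, each SD = 0.02.
rng = random.Random(42)
pairs = [(rng.uniform(0.7, 1.0), 0.02, rng.uniform(0.7, 1.0), 0.02, 0.02)
         for _ in range(20_000)]
noise_var = spurious_epistasis_variance(pairs)
print(round(noise_var, 5))
```

For small noise, the result should be close to the error-propagation estimate (s01/g01)² + (s10/g10)² + (s11/g11)², divided by (ln 2)² and averaged over pairs.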

We then decomposed the variance observed in the origi- 
nal dataset into a contribution produced by experimental 
uncertainty and a contribution of biological origin; the 
strength of epistatic interactions was finally computed as 
the square root of the biological part of the variance. For 
deleterious knockouts, the relative difference between 
epistasis computed from the raw data and from the data 
after subtracting the experimental noise was less than 
30%, emphasizing the significant but not overwhelming 
contribution of experimental noise to the observed varia- 
bility. Figure 2(a-c) represents the 'biological' part of the 
observed epistasis; before subtracting the contribution of 
the experimental uncertainty, the plots are qualitatively 
similar, but quantitatively slightly different (see Additional 
file 1, Figure S4). Importantly, because variances are addi- 
tive, the estimated contribution of the experimental uncer- 
tainty to epistasis is largely independent of the choice of 
the statistical distribution used to model experimental 
uncertainty. In two instances, however, the unknown details of the full distribution of experimental noise are important: when outlining the distribution of epistatic interactions (Figure 2d) and when describing the probability to observe sign epistasis (Figure 4b). In those two figures, we plotted the raw data, and did not attempt to deconvolve the contribution of experimental uncertainty.
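Because variances add, the decomposition described here reduces to a subtraction per bin followed by a square root. In the sketch below, the clamp at zero (for bins where the noise estimate matches or exceeds the total) and the example variances are hypothetical choices.

```python
import math


def biological_strength(total_var, noise_var):
    """sigma of epistasis after subtracting the noise variance (variances add)."""
    return math.sqrt(max(total_var - noise_var, 0.0))  # clamp: noise may match or exceed total


def relative_correction(total_var, noise_var):
    """Relative drop from the raw sigma to the noise-corrected sigma."""
    raw = math.sqrt(total_var)
    return (raw - biological_strength(total_var, noise_var)) / raw


# Hypothetical bin: total variance 0.010, of which 0.004 is attributed to noise.
print(round(biological_strength(0.010, 0.004), 4))  # sqrt(0.006)
print(round(relative_correction(0.010, 0.004), 3))
# A neutral-mutation bin where noise explains everything gives zero biological epistasis.
print(biological_strength(0.002, 0.0025))
```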

Comparison between theory and experiment 

The scaling of epistasis observed in the Costanzo et al. 
dataset (Figure 2) is in sharp contrast to the predictions 
of Fisher's geometric model [19], a popular model of 
epistasis in which genetic interactions emerge from geometry. As we saw, when the effects of the two knockouts are similar ($G_{01} = G_{10} = G$), the variance of epistasis is approximately proportional to $|G|$. By contrast, in Fisher's model, the variance $\operatorname{var}(G, G)$ would grow at least as fast as $G^2$ (Figure 2c; see Additional file 1, Supplementary text 2), a much stronger dependence than the linear dependence observed experimentally.

Figure 3 Experimental noise does not account for all of the observed variance of epistasis. (a) Comparison of experimentally measured variance (solid lines; shaded areas: 25 to 75% confidence intervals) and variance caused by experimental noise (dashed lines). If one of the two mutations is neutral, noise accounts for all of the observed variance (black). When deleterious mutations are combined, noise accounts for less than half of the observed variance (red, $G_{01} = -0.7$). (b) Ratio between total observed variance and noise-generated variance as a function of the log growth of the knockouts being combined. For deleterious knockouts, the ratio can be significantly greater than 1.

A concrete numerical example can highlight the importance of the weaker-than-expected scaling of epistasis described in this study. Let us consider two gene knockouts, each of which reduces the relative growth rate by 5%, from 1.0 to 0.95. According to the multiplicative null model, the growth rate of the double knockout will be approximately $0.95^2$, or approximately 0.90. The questions now are: What kind of deviations could be expected around 0.90? Would a growth rate of 0.85 be surprising? What about a growth rate of 0.50? Let us use the analytic fit discussed in the previous section, with

$$g_{01} = g_{10} = 0.95.$$

Then

$$G_{01} = G_{10} = \log_2(0.95) = -0.074,$$

$$G_{11} = G_{01} + G_{10} = -0.148,$$

and, since on the diagonal $\operatorname{var} = c|G|$,

$$\sigma(G_{01}, G_{10}) = \sqrt{c\,|G_{01}|} = 0.076.$$

Figure 4 Sign epistasis is less likely to occur between mutations with large effects. (a) Examples of a smooth landscape with paths of monotonically increasing fitness (left) and a rugged landscape characterized by reciprocal sign epistasis (right). (b) Experimentally measured probability of observing sign epistasis as a function of the log growth of two single knockouts with similar effects ($G_{01} = G_{10}$ within 20%, $G = \operatorname{mean}(G_{01}, G_{10})$). The blue shaded area is the standard error of the mean computed by bootstrap.

A ±1 standard deviation interval for the growth rate of the double knockout is then

$$[2^{-0.148-0.076},\ 2^{-0.148+0.076}] \approx [0.86, 0.95].$$

Notice that it is not unlikely that epistasis will cancel 
the effect of the second mutation, so that the growth 
rate of the double knockout mutant is greater than 0.95, 
that is, greater than the growth rate of either of the sin- 
gle knockout mutants. 

Let us now consider two gene knockouts with stronger effects, each of which reduces the growth rate from 1.0 to 0.60. Then

$$G_{01} = G_{10} = \log_2(0.60) = -0.737,$$

about 10 times as large as the log growth of the single mutants in the previous example. Fisher's model would predict $\sigma(G, G)$ at least 10 times larger than in the previous example ($\sigma(G, G) \geq 0.76$), and an interval of likely growth rates for the double knockout mutants (with $G_{11} = G_{01} + G_{10} = -1.474$) at least as large as

$$[2^{-1.474-0.76},\ 2^{-1.474+0.76}] \approx [0.21, 0.61].$$

Notice how, once again, it is not unlikely that owing 
to genetic interactions, the growth rate of the double 
knockout mutant is greater than 0.60, the growth rate of 
either of the two single knockout mutants. The analytic 
model derived from the experimental data leads to a strikingly different conclusion:

$$\sigma(G_{01}, G_{10}) = \sqrt{c\,|G_{01}|} = 0.241,$$

and the ±1 standard deviation interval for the growth rate of the double knockout becomes

$$[2^{-1.474-0.241},\ 2^{-1.474+0.241}] \approx [0.305, 0.425].$$

In this case, a deviation from the null model that is 
greater than three standard deviations would be needed 
for the double knockout mutant to have a growth rate 
greater than that of the single knockout (0.60), making 
the event extremely unlikely. 
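The arithmetic of the two examples can be reproduced with a few lines. This is a sketch using c = 0.079 and the diagonal scaling σ = √(c|G|) from the text; the function name is hypothetical, and σ = 0.76 stands in for the linear-scaling lower bound attributed to Fisher's model above.

```python
import math

C = 0.079  # fitted constant reported in the text


def double_knockout_interval(g_single, sigma=None, c=C):
    """+/- one sigma interval for g11 when both single knockouts have growth rate g_single.

    If sigma is None, use the fitted diagonal scaling sigma = sqrt(c|G|)."""
    G = math.log2(g_single)  # log growth of each single knockout
    G11 = 2 * G              # additive null model
    s = math.sqrt(c * abs(G)) if sigma is None else sigma
    return G, G11, s, (2 ** (G11 - s), 2 ** (G11 + s))


for g in (0.95, 0.60):
    G, G11, s, (lo, hi) = double_knockout_interval(g)
    # The double mutant outgrows the single mutants only if E > |G|, i.e. |G|/sigma sigmas.
    z = abs(G) / s
    print(f"g={g}: G={G:.3f}, G11={G11:.3f}, sigma={s:.3f}, "
          f"interval=[{lo:.3f}, {hi:.3f}], sigmas needed={z:.2f}")

# Linear-scaling lower bound for the second example: sigma = 10 x 0.076.
_, _, _, (lo_f, hi_f) = double_knockout_interval(0.60, sigma=0.76)
print(f"linear scaling: interval=[{lo_f:.2f}, {hi_f:.2f}]")
```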

Epistasis constrains the evolutionary dynamics 

The previous section provided two examples of recipro- 
cal sign epistasis, realized when two deleterious muta- 
tions produce a double mutant that is fitter than either 
of the two single mutants (Figure 4a). In those cases, a 
fitness valley limits the evolutionary accessibility of the 
fitter double mutant, and only on longer time scales 
may the simultaneous appearance of two mutations 
[20,21] drive a population to the new local fitness maxi- 
mum. In this context, the scaling behavior of epistasis is 
of great importance, because it determines the number 
and the topology of the evolutionarily accessible paths 
[2,22,23], ultimately affecting the possible outcomes of 
the evolutionary process. 



In order to describe how epistasis shapes naturally occurring fitness landscapes, let us consider $S(G, G)$, the probability to observe sign epistasis when combining two mutations with similar growth rate effects, $G$. Here, $S(G, G)$ depends on the typical interaction strength,

$$\sigma(G, G) = \sqrt{\operatorname{var}(G, G)}.$$

In particular, if $\sigma(G, G)$ is proportional to $G$, then the probability of observing sign epistasis is independent of $G$. Fisher's model implies a super-linear dependence of $\sigma(G, G)$ on $G$, thus predicting a greater probability of observing sign epistasis among mutations with strong effects. Instead, if $\sigma(G, G)$ is proportional to $\sqrt{|G|}$ (Figure 2), then sign epistasis is more likely to occur among mutations with small effects (Figure 4b).
When the relative growth rate effects of the single 
knockouts are small (<2 to 3%), experimental uncer- 
tainty prevents us from pinpointing which pairs of genes 
are epistatic. This does not mean, however, that muta- 
tions with small effects do not interact. Assuming that 
the scaling of epistasis we measured directly for muta- 
tions with intermediate and large effects extends to 
mutations with small effects, a consequence of the 
observed scaling of epistasis is the roughening of the 
local fitness landscape in the proximity of an evolution- 
ary optimum; when the fitness effects of available muta- 
tions become small [24], epistatic interactions become 
increasingly relevant [25,26], reducing the accessibility 
of evolutionary paths and further slowing down the rate 
of adaptation [27,28]. The evolutionary dynamics on 
correlated fitness landscapes [10,29] with the realistic 
correlations described here certainly deserves further 
experimental and theoretical investigation. 
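The dependence of $S(G, G)$ on the scaling of σ can be illustrated under a simplifying assumption: E is taken to be Gaussian with zero mean, although Figure 2d shows that the real distribution has a fat left tail. With $G_{01} = G_{10} = G < 0$, the double mutant ($G_{11} = 2G + E$) outgrows the single mutants exactly when $E > |G|$. The proportionality constant k = 0.5 for the linear case is hypothetical.

```python
import math

C = 0.079  # fitted constant reported in the text


def p_sign_epistasis(G, sigma):
    """P(E > |G|) for Gaussian E ~ N(0, sigma^2).

    With G01 = G10 = G < 0, the double mutant (G11 = 2G + E) outgrows the single mutants
    (log growth G) exactly when E > |G|."""
    return 0.5 * math.erfc(abs(G) / (sigma * math.sqrt(2.0)))


effects = (-0.05, -0.1, -0.2, -0.4, -0.8)
# Observed scaling: sigma = sqrt(c|G|), so |G|/sigma grows like sqrt(|G|).
observed = [p_sign_epistasis(G, math.sqrt(C * abs(G))) for G in effects]
# Hypothetical proportional scaling: sigma = k|G| with k = 0.5, for illustration only.
proportional = [p_sign_epistasis(G, 0.5 * abs(G)) for G in effects]

for G, p_obs, p_prop in zip(effects, observed, proportional):
    print(f"G={G:+.2f}  sqrt-scaling P={p_obs:.4f}  linear-scaling P={p_prop:.4f}")
```

Under square-root scaling the probability falls steadily with |G|, while under proportional scaling it stays constant, mirroring the two cases discussed above.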

The scaling of genetic interactions may be generic 

To date, our analysis has been limited to interactions 
between entire gene knockouts. Although mutations 
with extreme effects on gene regulation and horizontal 
gene transfer are biologically relevant mechanisms for 
the removal or acquisition of whole genes at once, 
organisms explore possible genetic variants largely 
through the accumulation of single point mutations. 
The Costanzo et al. dataset contains thousands of double mutants for which the first mutation is a gene knockout and the second mutation consists of one or more point mutations in a different gene, causing the gene product to misfold in a temperature-sensitive way. Although the distribution of growth rate effects for point mutations differs from that for single gene knockouts (see Additional file 1, Figure S2), the statistics of genetic interactions are remarkably similar when combining two single knockouts and when combining a single knockout with a point mutation (Figure 5). A similar scaling is also seen for the epistatic interactions between single gene knockouts and decreased abundance by mRNA perturbation (DAmP) [30] perturbations of a second gene (see Additional file 1, Figure S5). The analysis of these hybrid double mutants suggests that the statistics of the interactions between any two genetic perturbations are determined only by their growth rate effects [31], and not by their biological origin in terms of point mutations or gene knockouts.

Figure 5 Point mutations have similar epistatic interactions to those of entire gene knockouts. (a) Comparison between the variance observed in double gene knockout mutants (rainbow dots, same as in Figure 2a) and the variance observed in mixed double mutants generated by combining a gene knockout with point mutations in a different gene (black dots). (b) The red curve is the diagonal slice of the plot in (a) ($G_{01} = G_{10}$ within 20%, $G = \operatorname{mean}(G_{01}, G_{10})$), and the red shaded area is the 25 to 75% confidence interval for the mixed double mutant variance. For comparison, the blue curves describe the variance for double gene knockouts as in Figure 2c. As in Figure 2c, the red line has equation $\operatorname{var}(G, G) = 0.079|G|$.

A comparison between different definitions of epistasis 

Importantly, any quantitative result on epistasis is a con- 
sequence of how epistasis is defined. Of particular inter- 
est is how strong an epistatic interaction is deemed to 
be, based upon its ranking when compared with that of 
other pairs of mutations. Although the 'traditional' definition

$$\varepsilon = \frac{g_{11}}{g_{00}} - \frac{g_{01}}{g_{00}} \cdot \frac{g_{10}}{g_{00}}$$

and the 'geometric' definition

$$E = G_{11} - (G_{01} + G_{10})$$

agree about the sets of positive and negative interac- 
tions, they assign different strengths and, more impor- 
tantly, different rankings to the same pair of interacting 
mutations. As an example, if the Costanzo et al. dataset 
is analyzed using the 'traditional' definition of genetic 
interactions, then the linear dependence of var(G, G) on 
G in Figure 2c is replaced by an oddly non-monotonic 
dependence, displaying weaker interactions for pairs of 
genes with either very small or very large fitness effects 
(Figure 6a). As mentioned previously, this decrease in the inferred strength of epistatic interactions for very deleterious mutations is a mathematical consequence of the traditional definition of epistasis, rather than a property of genetic interactions. The same bias would lead
us to conclude that genes with strong effects on growth 
are almost non-interacting (Figure 6b, red line). How- 
ever, because previous studies have determined that 
essential genes partake in more interactions than do 
non-essential genes [32], it is also reasonable to expect 
that non-lethal genes with large growth effects are 
involved in more interactions than genes with small 
growth effects. Indeed, according to the 'geometric' defi- 
nition of epistasis, the fraction of genes with which a 
gene interacts steadily increases with the growth rate 
effect of the gene (Figure 6b, blue line). By contrast, the 
traditional definition of epistasis consistently assigns
low rankings to interactions between genes with large 
growth rate defects, as confirmed by a further analysis 
comparing the two definitions of epistasis against inter- 
actions inferred from the Gene Ontology (GO) database 
[33] (see Additional file 1, Figure S6). According to the 
geometric definition of epistasis, genetic networks [34] 
are denser than expected not only among essential genes
[32], but also among genes with large growth effects. 
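The ranking disagreement can be seen with two hypothetical pairs: a weak pair whose double mutant grows about 11% slower than predicted, and a strong pair whose double mutant grows at half the predicted rate. Both definitions agree the interactions are negative, but they order them differently. Growth rates are relative to a wild type normalized to 1; all values are illustrative.

```python
import math


def traditional_eps(g11, g01, g10):
    """epsilon = g11 - g01 * g10, growth rates relative to wild type (g00 = 1)."""
    return g11 - g01 * g10


def geometric_E(g11, g01, g10):
    """E = G11 - G01 - G10 with G = log2(g)."""
    return math.log2(g11) - math.log2(g01) - math.log2(g10)


# Hypothetical pairs, given as (g11, g01, g10).
pairs = {
    "weak knockouts": (0.72, 0.9, 0.9),     # predicted 0.81, observed about 11% lower
    "strong knockouts": (0.045, 0.3, 0.3),  # predicted 0.09, observed half of that
}
for name, args in pairs.items():
    print(f"{name}: epsilon={traditional_eps(*args):+.3f}, E={geometric_E(*args):+.3f}")

# Both definitions agree on the sign (negative), but rank the pairs in opposite order.
by_eps = max(pairs, key=lambda k: abs(traditional_eps(*pairs[k])))
by_E = max(pairs, key=lambda k: abs(geometric_E(*pairs[k])))
print(by_eps, "|", by_E)
```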

Finally, it is important to emphasize that the tradi- 
tional definition of epistasis remains slightly more suc- 
cessful at discovering the functional relations between 
genes, as cataloged in the GO database (see Additional 
file 1, Figure S6). Part of the reason for this could be 
that some of those functional characterizations were 
suggested by the traditional definition of epistasis in 
the first place. It is certainly true, however, that many of 
the top-ranking interactions according to the geometric 
definition of epistasis involve single and double mutants 
with small growth rates; for those mutants, experimental 
noise is relatively large, and this may cause a few weakly
interacting pairs to be incorrectly ranked as strongly 
interacting. The experimental protocols could likely be
adjusted to reduce the relative uncertainty in the growth
rates of especially slow-growing mutants and thus avoid
this issue (for example, by allowing much more time for
growth or by measuring the growth rates of additional
replicates).
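A first-order (delta-method) error propagation makes the problem with slow-growing mutants concrete for any measure built on log growth rates. Assuming, for illustration only, that a growth rate W is measured with an absolute standard error sigma_W that does not depend on W, and that r independent replicates are averaged, the uncertainty on log2 W is approximately as follows (the value sigma_W = 0.02 is hypothetical):

```latex
\sigma_{\log_2 W} \approx \frac{1}{\ln 2}\,\frac{\sigma_W}{W\sqrt{r}},
\qquad
\sigma_W = 0.02,\; r = 1:\quad
\sigma_{\log_2 W} \approx 0.032 \;\; (W = 0.9),
\qquad
\sigma_{\log_2 W} \approx 0.29 \;\; (W = 0.1).
```

Under these assumptions the log-scale uncertainty grows as 1/W for slow growers, while averaging replicates reduces it only as 1/sqrt(r), which is why longer growth periods or additional replicates would be needed for the slowest mutants.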

Conclusions 

We analyzed the growth rates of about five million dou- 
ble mutants in the dataset associated with the work by 
Costanzo et al. We characterized how the strength of 
genetic interactions depends on the growth effects of 
the mutations being combined, and found a weaker 
dependence than that predicted by current theoretical 
models. Although the results were obtained mainly from 
entire gene knockouts, there is some evidence that the 
observed scaling might extend to the interactions 
between single point mutations. The scaling of epistasis 
might or might not be generic [35,36]; important drivers 
could be the harshness of the environment [37], details 
about the evolutionary history [38-40], sexual versus 
asexual reproduction [41] and, perhaps most impor- 
tantly, metabolic [42-45] and genetic complexity [46,47]. 
In general, the experimentally observed scaling suggests 
a previously unexplored class of correlated fitness land- 
scapes with tunable roughness, in which epistasis 
depends explicitly on the effects of the mutations being 
combined. 

A clear limitation of our discussion is that only pair 
interactions were considered. Although high-throughput 
experiments will provide data on higher-order interactions,
a solid understanding of pair interactions remains
necessary before addressing n-mutation interactions. A
genuine three-mutation interaction, for instance, should 
be defined as the unexplained deviation from what can 
be computed by combining the effects of all relevant 
mutations and their pair interactions [10,48], perhaps 
using linear fits within the additive null model for log 
growth rates. 
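The definition of a genuine three-mutation interaction can be sketched for a single triple of mutations. This is a minimal illustration, assuming log2 growth rates are available for all single, double, and triple mutants. It uses the simplest exact residual rather than the linear fits suggested above, and the gene labels, effect sizes, and helper functions are hypothetical. A landscape built only from single effects and pair interactions yields a residual of zero, while an explicitly added three-way term is recovered.

```python
import math


def log_w_of(log_w: dict, *genes: str) -> float:
    """Look up a log2 growth rate by the (unordered) set of mutated genes."""
    return log_w[frozenset(genes)]


def pair_epsilon(log_w: dict, a: str, b: str) -> float:
    """Pair interaction: deviation from additivity of log growth rates."""
    return log_w_of(log_w, a, b) - log_w_of(log_w, a) - log_w_of(log_w, b)


def triple_epsilon(log_w: dict, a: str, b: str, c: str) -> float:
    """Residual left after removing all single effects and pair interactions."""
    singles = log_w_of(log_w, a) + log_w_of(log_w, b) + log_w_of(log_w, c)
    pairs = (pair_epsilon(log_w, a, b) + pair_epsilon(log_w, a, c)
             + pair_epsilon(log_w, b, c))
    return log_w_of(log_w, a, b, c) - singles - pairs


def build_landscape(singles: dict, pair_eps: dict, triple_eps: float) -> dict:
    """Hypothetical log2 growth rates generated from chosen effects."""
    log_w = {frozenset((g,)): s for g, s in singles.items()}
    for (g, h), e in pair_eps.items():
        log_w[frozenset((g, h))] = singles[g] + singles[h] + e
    log_w[frozenset(singles)] = (sum(singles.values())
                                 + sum(pair_eps.values()) + triple_eps)
    return log_w


singles = {"a": -0.1, "b": -0.2, "c": -0.3}
pair_eps = {("a", "b"): 0.05, ("a", "c"): -0.02, ("b", "c"): 0.01}

# Purely pairwise landscape: the three-mutation residual is (numerically) zero.
print(triple_epsilon(build_landscape(singles, pair_eps, 0.0), "a", "b", "c"))
# Landscape with an explicit three-way term of 0.07: the residual recovers it.
print(triple_epsilon(build_landscape(singles, pair_eps, 0.07), "a", "b", "c"))
```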

The results we present here were based on a geo- 
metric definition of epistasis. We compared this defini- 
tion with a more standard definition, highlighting the 
desirable mathematical properties of the geometric defi- 
nition and the simple phenomenological relations it 
produces. 

In conclusion, although each epistatic interaction 
between specific genes depends on biological details and 
remains largely unpredictable from first principles, we 
have shown that the statistical properties of an ensemble 
of interactions can display conspicuous regularities, and 
can be described by simple mathematical laws. 

Materials and methods 

The Costanzo et al. dataset is publicly available [49]. 
The file sgadata_costanzo2009_rawdata_101120.txt.gz
was downloaded on August 17, 2010 and analyzed with 
Mathematica (code available at the Gore laboratory 
website [50]). We restricted our analysis to double 
knockout mutants whose growth rates were positive 
numerical values and for which the growth rates of both 
single mutants were numerical values (see Additional 
file 1, Figure S1). Some genes appear in the dataset both
as query and array genes; care was taken to avoid dou- 
ble counting. 
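The filtering criteria above can be sketched in Python, although the original analysis was done in Mathematica. The record layout (query gene, array gene, query single-mutant growth, array single-mutant growth, double-mutant growth) is a hypothetical simplification and does not reproduce the columns of the raw data file. Keeping the first occurrence of a pair measured in both query/array orientations is also an assumption, chosen only to illustrate how double counting can be avoided.

```python
import math
from typing import Iterable, Optional


def as_number(value: object) -> Optional[float]:
    """Return value as a finite float, or None if it is not numerical."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None
    return x if math.isfinite(x) else None


def filter_double_mutants(records: Iterable[tuple]) -> list:
    """Apply the numerical-value filters and count each gene pair once.

    Hypothetical record layout:
    (query, array, query growth, array growth, double-mutant growth).
    """
    seen = set()
    kept = []
    for query, array, w_query, w_array, w_double in records:
        w_q, w_a, w_qa = as_number(w_query), as_number(w_array), as_number(w_double)
        # Keep positive double-mutant rates with numerical single-mutant rates.
        if w_q is None or w_a is None or w_qa is None or w_qa <= 0:
            continue
        # A pair seen as (query, array) and as (array, query) is counted once;
        # here the first occurrence is kept (an illustrative choice).
        pair = frozenset((query, array))
        if pair in seen:
            continue
        seen.add(pair)
        kept.append((query, array, w_q, w_a, w_qa))
    return kept


records = [
    ("geneA", "geneB", "0.90", "0.80", "0.70"),
    ("geneB", "geneA", "0.80", "0.90", "0.69"),  # same pair, roles swapped
    ("geneC", "geneD", "NaN", "0.95", "0.90"),   # non-numerical single-mutant rate
    ("geneE", "geneF", "0.85", "0.75", "0"),     # non-positive double-mutant rate
]
print(filter_double_mutants(records))
```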

The exponentially growing intervals used for the binning
of the log growth rate effects were defined as $[-2^{n}, -2^{n-1}]$
for an appropriate range of integers $n$. Owing to the rarity
of extremely deleterious mutations, bins for positive $n$
contained only a few data points, while bins with large
negative $n$ were extremely small. In the figures we
reported only bins for $n = -7$ to $0$, containing log growth
rate effects ranging from $-2^{0} = -1$ to $-2^{-8} \approx -0.0039$ or,
alternatively, relative growth rate effects ranging from
$2^{-1} = 0.5$ to $2^{-0.0039} \approx 0.997$. Different choices for the
binning sizes and positions did not significantly alter the
results of the analysis.
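The binning scheme can be sketched as follows. The sketch assumes that log growth rate effects are base-2 logarithms of relative growth rates, which is what the correspondence between $-1$ and $0.5$ above implies. The helper names are hypothetical, and assigning a value that falls on a shared edge to the bin in which it is the most negative endpoint is an illustrative convention, since the intervals are closed on both sides.

```python
import math


def bin_index(log_effect: float) -> int:
    """Index n of the bin [-2**n, -2**(n-1)] containing a negative log2 effect.

    x lies in [-2**n, -2**(n-1)] exactly when n - 1 <= log2(-x) <= n,
    so n = ceil(log2(-x)); a value on a shared edge goes to the bin
    in which it is the most negative endpoint.
    """
    if log_effect >= 0:
        raise ValueError("bins are defined only for negative (deleterious) effects")
    return math.ceil(math.log2(-log_effect))


def bin_edges(n: int) -> tuple:
    """Log2 growth-effect edges of bin n, most negative edge first."""
    return (-(2.0 ** n), -(2.0 ** (n - 1)))


def relative_effect(log_effect: float) -> float:
    """Convert a log2 growth effect into a relative growth rate."""
    return 2.0 ** log_effect


# The bins shown in the figures: n = -7 to 0.
for n in range(-7, 1):
    lo, hi = bin_edges(n)
    print(n, lo, hi, round(relative_effect(lo), 4), round(relative_effect(hi), 4))
```

For example, a log effect of -0.75 (relative growth of about 0.59) falls in bin $n = 0$, and -0.004 falls in bin $n = -7$.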

In order to quantify the contribution of experimental 
uncertainty to epistasis, we generated nine randomized 
mock datasets. The mean level of noise-generated epis- 
tasis in these nine datasets is reported in Figure 4 
(dashed lines), and we provide an extensive discussion 
of the choice of Student's t-distributions to generate the
mock datasets from the original dataset (see Additional 
file 1, Supplementary text 3). 
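One way to generate such mock datasets is sketched below. It assumes that each measured growth rate comes with a standard deviation and that noise is added as that standard deviation times a Student's t variate. The degrees of freedom, the random seed, the example values, and the helper names are all hypothetical. The parameter choices actually used are discussed in Additional file 1, Supplementary text 3, and are not reproduced here. The t variate is built from the standard construction Z / sqrt(V / dof), with Z standard normal and V chi-squared, drawn as a gamma variate with shape dof/2 and scale 2.

```python
import math
import random


def student_t(rng: random.Random, dof: float) -> float:
    """Draw from Student's t-distribution as Z / sqrt(V / dof), V ~ chi-squared(dof)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(dof / 2.0, 2.0)  # chi-squared(dof) = Gamma(shape=dof/2, scale=2)
    return z / math.sqrt(v / dof)


def mock_dataset(measurements, dof: float, rng: random.Random) -> list:
    """Perturb each (value, standard deviation) pair with scaled t-distributed noise."""
    return [value + sd * student_t(rng, dof) for value, sd in measurements]


# Hypothetical growth rates with their measurement standard deviations.
measurements = [(0.95, 0.02), (0.60, 0.05), (0.10, 0.04)]
rng = random.Random(17)  # fixed seed so the mock datasets are reproducible
mocks = [mock_dataset(measurements, dof=4, rng=rng) for _ in range(9)]
for mock in mocks:
    print([round(w, 3) for w in mock])
```

Heavy-tailed t noise (small dof) produces occasional large deviations that a Gaussian model would miss, which matters when estimating how much apparent epistasis noise alone can generate.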

The GO database go_201207-assocdb-tables.tar.gz was 
downloaded from the GO site [51] on July 19, 2012. 
The MySQL database was queried with Python and analyzed
with Mathematica (code available upon request).

Additional material 